Skip to content

[URLSession]Fix for a timeout bug(SR-2681) #911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 10, 2017

Conversation

pushkarnk
Copy link
Member

@pushkarnk pushkarnk commented Mar 7, 2017

Currently, we use the URLSessionConfiguration.timeoutIntervalForRequest value to trigger a timeout. https://bugs.swift.org/browse/SR-2681 reported a bug in which the timeout counter is not reset on every data transfer.

This is from the API doc:

The request timeout interval controls how long (in seconds) a task should wait for additional data to
 arrive before giving up. The timer associated with this value is reset whenever new data arrives. 
When the request timer reaches the specified interval without receiving any new data, it triggers a timeout.

This patch tries to reuse the _TimeoutSource class which is currently used to trigger timeouts in MultiHandle, with some modifications. We reset the timer every time data is available.

@pushkarnk pushkarnk force-pushed the timeout-bug branch 2 times, most recently from 1293512 to f04bfc1 Compare March 7, 2017 12:24
@pushkarnk
Copy link
Member Author

With the current URLSession testing infrastructure, a test case for this, is a challenge.

@pushkarnk
Copy link
Member Author

@parkera @naithar Can you please review and comment?

@naithar
Copy link
Contributor

naithar commented Mar 7, 2017

I have no comments on implementation, except for adding guard `self` = self else { return } inside DispatchWorkerItem closure. But i'm not really sure if it actually affects anything

As for testing, if I understand the problem correctly, TestURLSessionServer initializer could receive a delay interval for response, like init(port: UInt16, delay: TimeInterval? = nil) which could be used by .readAndRespond() method to delay actual response.
Here's the changes for TestURLSessionServer class and runServer(...) method that seem to correctly implement the timeout testing (I've changed nil default to 5 to test it): https://gist.github.com/naithar/7f26c24dd08c312858d13d4f5e97f728

@naithar
Copy link
Contributor

naithar commented Mar 7, 2017

Oh, now I see that i didn't understand the problem with tests correctly.

@pushkarnk
Copy link
Member Author

Thanks @naithar I have added the guard

@naithar
Copy link
Contributor

naithar commented Mar 7, 2017

@pushkarnk
Again about testing. Seems like i've started digging in a right place actually.
If we can change a func writeData(data: String) method to split resulted string in chunks and send it with a delay this could emulate the request to http://httpbin.org/drip?duration=10&numbytes=10&code=200 which is used in bug report. But currently i managed only to send multiple data parameter in a single response.

var bytes = Array(data.utf8)


for _ in 0..<5 {
    sleep(1)
    _  = try? attempt("write", valid: isNotNegative, CInt(write(connectionSocket, &bytes, bytes.count)))
}

@parkera
Copy link
Contributor

parkera commented Mar 7, 2017

The diff looks reasonable but we should be able to get some kind of unit test in place for this.

@pushkarnk
Copy link
Member Author

pushkarnk commented Mar 8, 2017

@naithar Thanks for your work. I will try out your suggestion. I was hoping that we can also delay the response as in:

    public func respond(with response: _HTTPResponse, delay: Int = 0) throws {
        let semaphore = DispatchSemaphore(value: 0)
        let deadlineTime = DispatchTime.now() + .seconds(delay)
        DispatchQueue.main.asyncAfter(deadline: deadlineTime) {
            do {
                try self.socket.writeData(data: response.description)
                semaphore.signal()
            } catch { }
        }
        semaphore.wait()
     }

I did this and I see a crash. Investigating it now.

@naithar
Copy link
Contributor

naithar commented Mar 9, 2017

Does crash message anything? Maybe "Trying to access a behaviour for a task that in not in the registry."? I can sometimes reproduce this by just delaying response for long enough.

This sometimes happens, because .taskCompleted state is set before .transferFailed, so task is no longer in registry when .completeTask(...) method is called.

@naithar
Copy link
Contributor

naithar commented Mar 9, 2017

Not sure what happened, but reapplying a PR to current master removed the crash.
This commit implements a chunked response, so this PR could be tested. It's ugly but it works.

@pushkarnk
Copy link
Member Author

pushkarnk commented Mar 9, 2017

@naithar Yes, I did see that fatal error. Nevertheless, your patch looks good and the test runs well. I am wondering how we go about. I'd suggest we have this PR merged first and then you create a PR with your patch.

@pushkarnk
Copy link
Member Author

I also realised that I need to cancel the timer when completeTask() is called.

@naithar
Copy link
Contributor

naithar commented Mar 9, 2017

@pushkarnk i can also create a PR with test commented out. Actually i'm not agains you taking this test implementation into this PR.

@pushkarnk
Copy link
Member Author

You creating the PR with calls to the new tests commented out is a better idea, in my opinion :)

@naithar
Copy link
Contributor

naithar commented Mar 9, 2017

Then I'll do it now.

@naithar
Copy link
Contributor

naithar commented Mar 9, 2017

Submitted #913

@pushkarnk pushkarnk force-pushed the timeout-bug branch 2 times, most recently from 5b5bc08 to d4e8ecd Compare March 9, 2017 10:03
@pushkarnk
Copy link
Member Author

pushkarnk commented Mar 10, 2017

I can consistently see the "Trying to access a behaviour for a task that in not in the registry." failure with @naithar 's test case from #913 . Investigating. It is probably related to the previous test_cancelTask test and the fact that we use the same URLSession object.

@naithar
Copy link
Contributor

naithar commented Mar 10, 2017

That's strange.
I can see same failure. But moving test_taskTimeout test to the top of the tests array actually solves the issue.

I'll look into it a little more.

@pushkarnk
Copy link
Member Author

Yes, looks like we need to cancel the timeoutTimer in complete(withError:) too.

@naithar
Copy link
Contributor

naithar commented Mar 10, 2017

Actually test_taskTimeout causes failure only if it follows test_cancelTask test. Atm I'm not sure what exactly causing the failure.

Also moving test_cancelTask to the top causes test_dataTaskWithURL fail for the same reason as timeout test - Trying to access a behaviour for a task that in not in the registry

@naithar
Copy link
Contributor

naithar commented Mar 10, 2017

Running Cancel test actually causes this error for TestURLSessionServer:

bind failed: Unknown error: -1 (-1)
  - operation : "bind"
  - errno : -1
  - file : "HTTPServer.swift"
  - line : 69

It's not displayed or handled because server is starting with this code (i guess the problem is in XCTAssertTrue(true)), so it won't show anything:

globalDispatchQueue.async {
            do {
                try self.runServer(with: serverReady)
            } catch {
                XCTAssertTrue(true)
                return
            }
        }

Also not signaling semaphore in catch in runServer(...) method causes tests to hang sometimes.

@pushkarnk I see several workarounds:

  • Make cancel test run last.
  • Disable server start for cancel test (I guess server just hangs and waits for request, which never happen in test_cancelTask)
  • Somehow change TestURLSessionServer or runServer(...) so it won't hang.

@naithar
Copy link
Contributor

naithar commented Mar 10, 2017

Switching

} catch let e as ServerError {
    if e.operation != "bind" { continue }

with

} catch let e as ServerError {
    if e.operation == "bind" { continue }

in runServer(...) method seems like a workaround and removes the error and crashes.

@pushkarnk
Copy link
Member Author

Thanks @naithar That seems like a fix (not a workaround). We retry on a bind failure and throw an error otherwise.

runServer must retry on a bind failure - thanks to Sergey (@naithar)
@pushkarnk
Copy link
Member Author

@swift-ci please test

@pushkarnk
Copy link
Member Author

@swift-ci please test and merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants